1 Summary

This analysis looks at data on skier chairlift rides. A few sample skier “days” for multiple seasons (05-06 to 14-15) are used to draw general conclusions about the capability of modeling in enhancing prediction of skier visits and chair lift rides.

The analysis provides potential benefit to three groups: 1. Area-management. Predictive models provide skier load anlaysis allowing better management of hill-side resources. Details of chair loading can be correlated to date, day, and time, weather conditions, snow conditions, etc. 2. Management and Users: Models can detect fraudulent pass use, protecting purchasers and area managment from fraud. 3. Area-users: Analytics can provide insights to skiers on their performance and possibly enable marketing and other offers to tailor and optimize their on-hill and off-hill experience.

The purpose is to explore the feasiblity and reliability with which skier visits can be predicted and analyzed for the benefit of the hill operator.

A secondary purpose is to provide advanced analystics to individual skiers and explore whether “fraud” (that is the use of a ski pass by other than the owner) can be detected from this data.

Key findings include:
1. Skier behavior is remarkably reproducile on a day to day basis.

2 Data Structure

All ski runs data for a sample of individual skiers are documented. Note that this sample is much smaller than the total data available to the ski lift operator (who in principle has it all).

2.1 Raw Data

Just to give a flavor for the data, here is a sample of ski runs for an individual skier. Note the sample spans multiple years.

time chair vertical_feet vertical_meters year month day time_of_day date pass
2014-04-11 14:20:04 Summit 1709 520 2014 4 11 14:20:04 2014-04-11 7b410007
2014-04-11 13:16:36 Summit 1709 520 2014 4 11 13:16:36 2014-04-11 7b410007
2014-04-11 12:09:55 Pine Marten 1367 416 2014 4 11 12:09:55 2014-04-11 7b410007
2013-04-12 12:08:23 Summit 1709 520 2013 4 12 12:08:23 2013-04-12 7b410007
2013-04-09 11:45:30 Northwest 2377 724 2013 4 9 11:45:30 2013-04-09 7b410007
2013-04-09 09:19:08 Pine Marten 1367 416 2013 4 9 09:19:08 2013-04-09 7b410007
2012-04-24 09:19:28 Pine Marten 1367 416 2012 4 24 09:19:28 2012-04-24 7b410007
2011-04-10 13:04:16 Pine Marten 1367 416 2011 4 10 13:04:16 2011-04-10 7b410007
2010-04-12 13:43:37 Pine Marten 1367 416 2010 4 12 13:43:37 2010-04-12 7b410007

2.2 Summarized for daily data

Each skier makes a few dozen to several hundred runs per season. A way to meaningfully summarize the data is to first read in all the data and looks at total number of days skied.

2.2.1 14-15

Here’s a representation of the skiers per day for the 14-15 Season.

    print(head(skier_day_count_14_15))
##      all_ski_days Freq week yday season_week     wday year season
## 1799   2015-03-01   80    9   59          18   Sunday 2015  14-15
## 1723   2014-12-13   51   50  346           7 Saturday 2014  14-15
## 1757   2015-01-16   49    3   15          11   Friday 2015  14-15
## 1821   2015-03-24   48   12   82          21  Tuesday 2015  14-15
## 1797   2015-02-27   43    9   57          17   Friday 2015  14-15
## 1730   2014-12-20   42   51  353           8 Saturday 2014  14-15
##      normal_Freq skier_cum
## 1799      1.0000    1.0000
## 1723      0.6375    1.6375
## 1757      0.6125    2.2500
## 1821      0.6000    2.8500
## 1797      0.5375    3.3875
## 1730      0.5250    3.9125

2.2.2 12-13 Season Comparison

Here’s a representation of the skiers per day for the 12-13 Season.

    #print(head(skier_day_count_12_13))
    
    
##      all_ski_days Freq week yday season_week     wday year season
## 1419   2013-02-24   79    8   54          17   Sunday 2013  12-13
## 1418   2013-02-23   71    8   53          17 Saturday 2013  12-13
## 1461   2013-04-07   65   14   96          23   Sunday 2013  12-13
## 1370   2013-01-06   63    1    5          10   Sunday 2013  12-13
## 1336   2012-12-02   60   49  336           5   Sunday 2012  12-13
## 1391   2013-01-27   60    4   26          13   Sunday 2013  12-13
##      normal_Freq skier_cum
## 1419   1.0000000  1.000000
## 1418   0.8987342  1.898734
## 1461   0.8227848  2.721519
## 1370   0.7974684  3.518987
## 1336   0.7594937  4.278481
## 1391   0.7594937  5.037975    

2.2.3 12-13 Season

Next, let’s explore looking at daily total runs and vertical feet.

2.2.4 Totals Runs Per Day

Here is a sample of the data for total runs per day per skier.

runs_per_day date pass week season_week wday mon year weekend holiday bizday season
19.00 2013-02-24 dc6e8366 8 13.00 0 1 2013 TRUE TRUE FALSE 12-13
12.00 2014-02-23 22f3ebd1 8 13.00 0 1 2014 TRUE TRUE FALSE 13-14
19.00 2013-02-03 22f3ebd1 5 10.00 0 1 2013 TRUE TRUE FALSE 12-13
14.00 2009-11-26 22f3ebd1 48 52.00 4 10 2009 FALSE TRUE FALSE 09-10
8.00 2015-04-09 cf87f389 15 19.00 4 3 2015 FALSE FALSE TRUE 14-15
6.00 2012-12-30 9e9db4ff 53 5.00 0 11 2012 TRUE TRUE FALSE 12-13
5.00 2013-03-16 4d129450 11 16.00 6 2 2013 TRUE TRUE FALSE 12-13
12.00 2011-02-06 4d129450 6 10.00 0 1 2011 TRUE TRUE FALSE 10-11
6.00 2009-11-27 4d129450 48 52.00 5 10 2009 FALSE FALSE TRUE 09-10

From random sample of data from 15 out of the 348 skiers a graph below show that runs per skier are fairly consistent from season to season.

2.2.5 Total Vertical Feet Per Day

We can also look at the number of vertical feet skied per day per skier. Again, most skiers seem to demonstrate fairly consistent behavior.

total_vertical date pass week season_week wday year weekend holiday bizday season
34569 2013-02-24 dc6e8366 8.00 13.00 0 2013 TRUE TRUE FALSE 12-13
17815 2014-02-23 22f3ebd1 8.00 13.00 0 2014 TRUE TRUE FALSE 13-14
26122 2013-02-03 22f3ebd1 5.00 10.00 0 2013 TRUE TRUE FALSE 12-13
18589 2009-11-26 22f3ebd1 48.00 52.00 4 2009 FALSE TRUE FALSE 09-10
11342 2015-04-09 cf87f389 15.00 19.00 4 2015 FALSE FALSE TRUE 14-15
11661 2012-12-30 9e9db4ff 53.00 5.00 0 2012 TRUE TRUE FALSE 12-13
5097 2013-03-16 4d129450 11.00 16.00 6 2013 TRUE TRUE FALSE 12-13
18837 2011-02-06 4d129450 6.00 10.00 0 2011 TRUE TRUE FALSE 10-11
5407 2009-11-27 4d129450 48.00 52.00 5 2009 FALSE FALSE TRUE 09-10

A random sample of 15 of the current 348 skiers is used. Below are the first few rows of the data. (NOTE - PASS NUMBERS HAVE BEEN HASHED TO PRESERVE ANONYMITY)

There appears to be pretty good distinction between individual skiers. We can also check the data from season to season as below. Again, it appears to provide a reasonably consistent picture.

3 Skier Speed Analytics

We need to add some more data to analysze skier speed. While the pass data tracks the vertical rise of the chairlift ride, the descent, which is the actual skiing part, is to the base of the next chairlift. Base elevations are not supplied with the downloaded data so need to be added. We can get some more data from the Master Development Plan and Google Maps

Ski run speed is affected by lots of factors, including the path chosen and terrain. To examine this we’ll look at the distrubution of runs split by chair lift and destination.

As might be expected, there’s a lot of variation in the speed. While some runs, such as Pine Marten to Summit, show almost uniform variation, the steeper runs returning to the same chairlift show the most data and most consisent variation.

Here’a a detailed look at a some chairlifts.

For the lift Pine Marten there are 4033 total runs for 15 skiers.

For the lift Summit there are 904 total runs for 12 skiers.

For the lift Northwest there are 1105 total runs for 15 skiers.

4 Chairlift Choice

Another preference individual skiers have is for specific chairlifts. Let’s see if this can contribute to behavioral classification. To do this we’ll adopt a simple model looking at charilift patterns.

## Using chair_list as id variables

As can be seen there are strong variations in the choice of chairlifts between skiers.

4.1 MLE for Ski days

set.seed(8675309+3)
## just get one pass
pass_sample<-"MBA6360970.csv"#sample(valid_pass, 1)

##calculate reference for rms
    skier_probs<-NULL
    chair_list<-rownames(look_up)
    chair_list<-gsub(" ", "_", chair_list)

    skier_probs<-as.data.frame(chair_list)

    ##get overall skier probs    
        pass<-pass_sample
        ## read in the pass data
        ski_runs<-NULL
    
        ski_runs<-read.csv(paste0(Directory, pass))
    
        ##fix Pine_Marten
        ski_runs$chair<-gsub(" ", "_", ski_runs$chair)
    
        ## suppress pass_id column
        ski_runs$pass_id<-NULL
        ## hash the pass identity
        pass_s<-digest(pass, algo="xxhash32")
        ski_runs$pass<-pass_s
    
        prob<-chair_probs(ski_runs$chair, chair_list)
    
        colnames(prob)<-pass_s
    
        ##handle exception on first pass
        if (!is.null(skier_probs)) {skier_probs <- cbind(skier_probs, prob)} else {skier_probs<-prob}
    ## get dates
        
        
        
    dates_list<-unique(ski_runs$date)
    
    day_probs<-NULL
    
    for (i_date in dates_list){
        
        #i_date<-dates_list[1]
        day_runs<-ski_runs[ski_runs$date==i_date,]
    
        prob<-chair_probs(day_runs$chair, chair_list)
        
        colnames(prob)<-i_date
    
        ##handle exception on first pass
        if (!is.null(day_probs)) {day_probs <- cbind(day_probs, prob)} else {day_probs<-prob}
    
        
    }

    t<-day_probs
    t<-t[,1:ncol(t)]-skier_probs[,2]
    
           colMin <- function(data) sapply(data, min, na.rm = TRUE)

    w<-colMin(t)      
    t<-sqrt(t*t)
    
        colMax <- function(data) sapply(data, max, na.rm = TRUE)
    
    y<-colMax(t)
    
    z<-colSums(t)
    #convert to long format
    u<-melt(t)
## No id variables; using all as measure variables
    v<-melt(y)
    z<-melt(w)
    u$date<-rownames(u)
    v$date<-rownames(v)
    z$date<-rownames(z)
    
    day_probs$chair<-rownames(day_probs)
    b<-melt(day_probs, id="chair")
    colnames(b)<-c("chair", "date", "prob")
    
    p<-ggplot(b, aes(x=date, y= prob, color = factor(chair)))
    p <- p + geom_point()
    print(p)

    q<- ggplot(u, aes(x=date, y = value))
    q<- q + geom_point(size = 3, color = "salmon")
    print(q)

    q<- ggplot(v, aes(x=date, y = value))
    q<- q + geom_point(size = 3, color = "darkgreen")
    print(q)

    q<- ggplot(z, aes(x=date, y = abs(value)))
    q<- q + geom_point(size = 2, color = "darkblue")
    print(q)

    # this is what ski_runs will look like
#                    time       chair vertical_feet vertical_meters year month day time_of_day       date     pass
# 1   2006-03-11 11:14:33 Pine_Marten          1367             416 2006     3  11    11:14:33 2006-03-11 a016b213
# 2   2006-03-11 11:27:59 Pine_Marten          1367             416 2006     3  11    11:27:59 2006-03-11 a016b213
# 3   2006-03-11 11:43:00 Pine_Marten          1367             416 2006     3  11    11:43:00 2006-03-11 a016b213
# 4   2006-03-11 11:43:00 Pine_Marten          1367             416 2006     3  11    11:43:00 2006-03-11 a016b213

The approach will be to take the averaged probabilities and use the MLE’s for a single skier to calculate the expected probability for each ski day for a large sample of skiers.
Here is a sample of skier probabilities (expressed as logarithms) for a specific passholder.

chair_list 35292c89
Sunrise 0.16
Rainbow 0.06
Skyliner 0.12
Summit 0.13
Sunshine 0.01
Pine_Marten 0.21
Red 0.00
Outback 0.12
Northwest 0.19

The table above shows the probabilities for 35292c89.

Here are some sample probabilities

date pass day_prob
2005-12-09 35292c89 0.18
2005-12-10 35292c89 0.16
2005-12-28 35292c89 0.14
2005-12-29 35292c89 0.14
2005-12-30 35292c89 0.15
2006-01-01 35292c89 0.16
2006-01-28 35292c89 0.15
2006-01-29 35292c89 0.17
2006-02-11 35292c89 0.12
2006-02-20 35292c89 0.14

4.1.1 Histogram

##         date     pass  day_prob color_code
## 1 2005-12-09 35292c89 0.1784988       TRUE
## 2 2005-12-10 35292c89 0.1559082       TRUE
## 3 2005-12-28 35292c89 0.1405858       TRUE
## 4 2005-12-29 35292c89 0.1351913       TRUE
## 5 2005-12-30 35292c89 0.1538793       TRUE
## 6 2006-01-01 35292c89 0.1612128       TRUE

4.2 Days of the Week